The Random Subspace Method for Constructing Decision Forests
Author
Abstract
Much previous work on decision trees has focused on splitting criteria and the optimization of tree size. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. This paper proposes a method for constructing a decision-tree-based classifier that maintains the highest accuracy on training data and improves in generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared with single-tree classifiers and other forest construction methods in experiments on publicly available datasets, where the method’s superiority is demonstrated. We also discuss independence between trees in a forest and relate it to the combined classification accuracy.
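The construction described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the tree learner (axis-parallel splits chosen by Gini impurity), the default subspace size of half the features, the majority-vote combination, and every function name here (`build_tree`, `build_forest`, `predict_forest`, and so on) are assumptions made for illustration. What it does keep from the abstract is that every tree is trained on all training samples, restricted to a pseudorandomly chosen subset of feature components, and grown until its leaves are pure.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (1.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, features):
    """Grow a tree to purity, splitting only on the given feature indices.

    A leaf is a bare label; an internal node is (feature, threshold, left, right).
    Labels are assumed not to be tuples (ints or strings work).
    """
    if len(set(y)) == 1:
        return y[0]
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        # Thresholds are observed values below the maximum,
        # so both children are always non-empty.
        for t in values[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        # The samples are identical inside this subspace: fall back to majority.
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], features),
            build_tree([X[i] for i in ri], [y[i] for i in ri], features))


def predict_tree(node, row):
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def build_forest(X, y, n_trees=25, subspace_size=None, seed=0):
    """Train each tree on ALL samples but on a random subset of features."""
    rng = random.Random(seed)  # seeded, so the selection is pseudorandom
    n_features = len(X[0])
    # Assumed default: half of the features per tree.
    k = subspace_size or max(1, n_features // 2)
    return [build_tree(X, y, rng.sample(range(n_features), k))
            for _ in range(n_trees)]


def predict_forest(forest, row):
    """Combine the trees by majority vote (a simplifying assumption)."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(6)] for _ in range(40)]
    y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]
    forest = build_forest(X, y, n_trees=25, subspace_size=3)
    print(predict_forest(forest, X[0]), y[0])
```

Because no training samples are discarded and each tree is grown to purity, every tree in this sketch fits the training set exactly whenever the samples remain distinguishable inside its subspace. The trees differ only in which features they are allowed to split on, which is the source of diversity the abstract refers to.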
Similar resources
Hybrid weighted random forests for classifying very high-dimensional data
Random forests are a popular classification method based on an ensemble of a single type of decision trees from subspaces of data. In the literature, there are many different types of decision tree algorithms, including C4.5, CART, and CHAID. Each type of decision tree algorithm may capture different information and structure. This paper proposes a hybrid weighted random forest algorithm, simul...
Stratified sampling for feature subspace selection in random forests for high dimensional data
For high dimensional data a large portion of features are often not informative of the class of the objects. Random forest algorithms tend to use a simple random sampling of features in building their decision trees and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for rand...
An Improved Random Forest Classifier for Text Categorization
This paper proposes an improved random forest algorithm for classifying text data. This algorithm is particularly designed for analyzing very high dimensional data with multiple classes whose well-known representative data is text corpus. A novel feature weighting method and tree selection method are developed and synergistically served for making random forest framework well suited to categori...
Oblique Random Forests for 3-D Vessel Detection Using Steerable Filters and Orthogonal Subspace Filtering
We propose a machine learning-based framework using oblique random forests for 3-D vessel segmentation. Two different kinds of features are compared. One is based on orthogonal subspace filtering where we learn 3-D eigenspace filters from local image patches that return task optimal feature responses. The other uses a specific set of steerable filters that show, qualitatively, similarities to t...
Extensions to Quantile Regression Forests for Very High-Dimensional Data
This paper describes new extensions to the state-of-the-art regression random forests Quantile Regression Forests (QRF) for applications to high dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other one containing less important features. Th...
Journal:
- IEEE Trans. Pattern Anal. Mach. Intell.
Volume: 20
Publication date: 1998